Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gcp sdk provided flow for obtaininng application default credenti… #430

Merged
merged 9 commits into from
Sep 20, 2024

Conversation

dbalabka
Copy link
Contributor

…als (#429)

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great improvement.

  • Do we need to bump the minimum version of the google.auth dependency to support this?
  • Can you fix up the tests, specifically the skip when missing credentials?
  • Can you verify that tests pass locally for you?

@dbalabka
Copy link
Contributor Author

dbalabka commented Aug 8, 2024

@jacobtomlinson, fixing the test requires supporting an older version of google.auth. Also, I've noticed that my changes break the cluster creation. I've been working on this to fix it.

@dbalabka
Copy link
Contributor Author

dbalabka commented Aug 8, 2024

However, I see that the minimum required version supports the used method google.auth.default:
https://github.com/googleapis/google-auth-library-python/blob/v1.23.0/google/auth/_default.py#L251

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed things up in the CI here, but there are now conflicts. @dbalabka would you mind polishing this up and getting it ready for another review?

@dbalabka
Copy link
Contributor Author

dbalabka commented Sep 18, 2024

@jacobtomlinson , thanks for the reminder. I belive I can get back to the PR improvements next week. In our case, I've started to use a service account; however, it is not the best option for local development according to GCP recommendation because the service account isn't tight with the developer's account: https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev:~:text=when%20your%20code%20is%20running%20in%20a%20local%20development%20environment%2C%20such%20as%20a%20development%20workstation%2C%20the%20best%20option%20is%20to%20use%20the%20credentials%20associated%20with%20your%20user%20account.

@dbalabka
Copy link
Contributor Author

dbalabka commented Sep 19, 2024

@jacobtomlinson , seems GCP tests are all skipped in test env. I've tried to run cluster locally and get a weird error. Can it be the old image reason?

dask_cloudprovider/gcp/tests/test_gcp.py:100 (test_create_cluster_sync)
>   ???

/opt/conda/lib/python3.10/site-packages/distributed/scheduler.py:4846: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/conda/lib/python3.10/site-packages/distributed/protocol/serialize.py:452: in deserialize
    ???
/opt/conda/lib/python3.10/site-packages/distributed/protocol/serialize.py:111: in pickle_loads
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: code expected at most 16 arguments, got 18

/opt/conda/lib/python3.10/site-packages/distributed/protocol/pickle.py:92: TypeError

The above exception was the direct cause of the following exception:

    @pytest.mark.asyncio
    @pytest.mark.timeout(1200)
    @pytest.mark.external
    async def test_create_cluster_sync():
        skip_without_credentials()
    
        cluster = GCPCluster(zone="europe-west4-a", n_workers=1)
        client = Client(cluster)
    
        def inc(x):
            return x + 1
    
>       assert client.submit(inc, 10).result() == 11

test_gcp.py:113: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../.venv/lib/python3.11/site-packages/distributed/client.py:402: in result
    return self.client.sync(self._result, callback_timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Future: error, key: inc-cf34e1fb7d5c448b29abbaab83f72f6a>
raiseit = True

    async def _result(self, raiseit=True):
        await self._state.wait()
        if self.status == "error":
            exc = clean_exception(self._state.exception, self._state.traceback)
            if raiseit:
                typ, exc, tb = exc
>               raise exc.with_traceback(tb)
E               RuntimeError: Error during deserialization of the task graph. This frequently
E               occurs if the Scheduler and Client have different environments.
E               For more information, see
E               https://docs.dask.org/en/stable/deployment-considerations.html#consistent-software-environments

../../../.venv/lib/python3.11/site-packages/distributed/client.py:410: RuntimeError

@jacobtomlinson
Copy link
Member

Yeah quite possibly! But if you got that far in the test then the Dask cluster deployed successfully. We should dig into what is happening here (I assume you have a different Python version locally than the one in the container), but I don't want that to hold up this PR.

@jacobtomlinson jacobtomlinson merged commit 6eaf2db into dask:main Sep 20, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants